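NLP (natural language processing) is the technology that converts the language people use in everyday communication into a form computers can understand and process. NLP is widely used in text-processing tasks, with applications such as the following: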
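Automatically sorting text into categories, for example news classification, spam filtering, and web article classification.
Automatically extracting key information from text, including person names, place names, and dates.
Extracting the core message of a text and automatically generating a summary of its main points.
Automatically generating natural-language text, such as proposal copy or chat messages.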
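Tokenization breaks a passage or article into small units called tokens, which are usually words or punctuation marks. The approach differs between languages: Chinese text has no spaces between words, so it needs a dedicated segmentation algorithm to identify word boundaries correctly.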
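To make the difference between languages concrete, here is a minimal sketch in plain Python. The English tokenizer is a simple regular expression, and the Chinese segmenter uses forward maximum matching, one classic dictionary-based approach. The function names and the tiny `TOY_VOCAB` dictionary are assumptions made up for this illustration; real segmenters rely on much larger dictionaries or statistical models.

```python
import re


def tokenize_english(text):
    """Split text into word tokens and single punctuation tokens."""
    # \w+ matches a run of letters/digits; [^\w\s] matches one punctuation mark.
    return re.findall(r"\w+|[^\w\s]", text)


def segment_forward_max_match(text, vocab, max_len=4):
    """Greedy forward maximum matching against a known vocabulary."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


# Hypothetical toy vocabulary, for illustration only.
TOY_VOCAB = {"自然", "語言", "自然語言", "處理", "很", "有趣"}

english_tokens = tokenize_english("Kevin Durant is my GOAT!")
chinese_tokens = segment_forward_max_match("自然語言處理很有趣", TOY_VOCAB)
print(english_tokens)  # ['Kevin', 'Durant', 'is', 'my', 'GOAT', '!']
print(chinese_tokens)  # ['自然語言', '處理', '很', '有趣']
```

The greedy strategy always prefers the longest dictionary entry, which is why 自然語言 comes out as one token instead of 自然 and 語言.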
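POS tagging labels each token with its part of speech, such as noun, verb, or adjective, to support later text analysis. It is essential for tasks such as syntactic parsing, semantic understanding, and NLG, because it helps a model understand sentence structure more accurately.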
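As a rough illustration of what tagging looks like, the sketch below implements a most-frequent-tag baseline (a unigram tagger): each word receives the tag it was seen with most often in a tagged corpus. The corpus `TOY_TAGGED`, the tag names, and the `NOUN` fallback for unknown words are all assumptions for this example; production taggers are trained on large annotated corpora and use context.

```python
from collections import Counter, defaultdict

# Hypothetical hand-tagged toy corpus of (word, tag) pairs, for illustration only.
TOY_TAGGED = [
    [("Kevin", "PROPN"), ("plays", "VERB"), ("basketball", "NOUN")],
    [("the", "DET"), ("team", "NOUN"), ("plays", "VERB"), ("well", "ADV")],
    [("a", "DET"), ("great", "ADJ"), ("team", "NOUN")],
]


def train_unigram_tagger(tagged_sentences):
    """Map each lowercased word to the tag it appears with most often."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}


def tag_tokens(tokens, model, default_tag="NOUN"):
    """Tag each token; unknown words fall back to default_tag (an assumption)."""
    return [(tok, model.get(tok.lower(), default_tag)) for tok in tokens]


model = train_unigram_tagger(TOY_TAGGED)
tagged = tag_tokens(["The", "great", "team", "plays"], model)
print(tagged)
# [('The', 'DET'), ('great', 'ADJ'), ('team', 'NOUN'), ('plays', 'VERB')]
```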
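NER marks the tokens that carry special meaning in a given application scenario or domain so that they can be used in text analysis. Common NER categories include person names, place names, events, and dates, and this information plays an important role in fields such as information retrieval and data mining.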
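The idea can be sketched with a deliberately simple rule-based tagger: a lookup table of known names (a gazetteer) plus a regular expression for four-digit years. This is not how modern NER models work, since they learn entities from annotated data; `TOY_GAZETTEER`, the label names, and `find_entities` are hypothetical and exist only for this example.

```python
import re

# Hypothetical gazetteer mapping surface forms to entity labels.
TOY_GAZETTEER = {
    "Kevin Durant": "PERSON",
    "Russell Westbrook": "PERSON",
    "Oklahoma City": "LOCATION",
    "Seattle": "LOCATION",
}

# Four-digit years from 1900 to 2099, treated here as DATE entities.
YEAR_PATTERN = re.compile(r"\b(?:19|20)\d{2}\b")


def find_entities(text):
    """Return (surface, label) pairs in order of appearance."""
    found = []
    for name, label in TOY_GAZETTEER.items():
        for match in re.finditer(re.escape(name), text):
            found.append((match.start(), match.group(), label))
    for match in YEAR_PATTERN.finditer(text):
        found.append((match.start(), match.group(), "DATE"))
    found.sort()  # order by position in the text
    return [(surface, label) for _, surface, label in found]


entities = find_entities("In 2007, Seattle selected Kevin Durant.")
print(entities)
# [('2007', 'DATE'), ('Seattle', 'LOCATION'), ('Kevin Durant', 'PERSON')]
```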
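Word embedding converts words into vectors in order to capture the relationships between them. In the high-dimensional vector space, similar words lie closer together. Similarity search is built on word embeddings, which are also widely used in tasks such as sentiment analysis and text classification.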
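A small sketch shows how "closer together" can be measured. It uses cosine similarity, one common choice of similarity measure, over hand-written 3-dimensional vectors. The values in `TOY_EMBEDDINGS` are invented for illustration; real embeddings are learned from data and typically have hundreds of dimensions.

```python
import math

# Hypothetical 3-dimensional embeddings, for illustration only.
TOY_EMBEDDINGS = {
    "basketball": [0.9, 0.1, 0.0],
    "football": [0.8, 0.2, 0.1],
    "banana": [0.0, 0.1, 0.9],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(query, embeddings):
    """Return the word whose vector is most similar to the query word's vector."""
    scores = [
        (cosine_similarity(embeddings[query], vector), word)
        for word, vector in embeddings.items()
        if word != query
    ]
    return max(scores)[1]


nearest = most_similar("basketball", TOY_EMBEDDINGS)
print(nearest)  # football
```

A similarity search does the same thing at scale: embed the query, then rank stored vectors by their similarity to it.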
The code for text classification is as follows:
from transformers import pipeline
# Load a pretrained text classification model
classifier = pipeline('text-classification', model='distilbert-base-uncased-finetuned-sst-2-english')
# Classify the text
result = classifier("Kevin Durant is my GOAT")
print(result)
The output is as follows:
[{'label': 'POSITIVE', 'score': 0.9685606360435486}]
The code for text summarization is as follows:
from transformers import pipeline
# Load a pretrained text summarization model
summarizer = pipeline('summarization', model='facebook/bart-large-cnn')
# Generate the summary
article = """
In 2007, the Seattle SuperSonics selected Kevin Durant with the second overall pick.
Though the Sonics struggled as a team, Kevin’s brilliance shone through.
He was named Rookie of the Year, and it became clear that he was destined for greatness.
However, the following year, the Sonics relocated to Oklahoma City, becoming the Thunder, and Kevin’s journey took a new turn.
In Oklahoma City, Kevin Durant flourished.
He formed a dynamic duo with another young star, Russell Westbrook, and together, they led the Thunder to the playoffs year after year.
Kevin’s scoring ability was unmatched, and he won four scoring titles in five years.
But despite their regular-season success, the Thunder struggled to capture an NBA championship.
They reached the Finals in 2012 but fell short to the Miami Heat, led by LeBron James.
"""
summary = summarizer(article, max_length=50, min_length=25, do_sample=False)
print(summary)
The output is as follows:
[{'summary_text': 'In 2007, the Seattle SuperSonics selected Kevin Durant with the second overall pick. The following year, the Sonics relocated to Oklahoma City, becoming the Thunder. He formed a dynamic duo with another young star, Russell Westbrook, and together, they led the Thunder to the playoffs year after year.'}]